-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-33958: secret re-creation scenario for externalCertificate with active informer #614
OCPBUGS-33958: secret re-creation scenario for externalCertificate with active informer #614
Conversation
Skipping CI for Draft Pull Request. |
@chiragkyal: This pull request references Jira Issue OCPBUGS-33958, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
41a1750
to
f51a3f2
Compare
/jira refresh |
@chiragkyal: This pull request references Jira Issue OCPBUGS-33958, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
70121a3
to
f655bd2
Compare
@chiragkyal: This pull request references Jira Issue OCPBUGS-33958, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Thanks for doing the test. I also tried on a live cluster. I think there is a race condition happening while updating the status of the route. The secret handler status update is racing with the status update of the plugin re-trigger. The top level plugin has a lock : router/pkg/router/controller/router_controller.go Lines 210 to 217 in be831fa
which means all routes are getting processed serially, unlike secret handler. As a solution, I think the |
Signed-off-by: chiragkyal <[email protected]>
Signed-off-by: chiragkyal <[email protected]>
@alebedev87 @Miciah As per our discussion over sync call, I've updated the contention tracker to ignore the status update done by the secret manager. Please take a final look at the PR, and let's try to get this merged. Thanks! |
I did some testing on a real cluster with multiple routes using the same secret, and after the latest changes, the transition of the status were as expected and consistent through out the routes.
|
/approve Thanks for the testing, we should be good now. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
tide/merge-method-squash |
/label tide/merge-method-squash |
if ignoreIngressConditionReason.Has(a.Reason) || ignoreIngressConditionReason.Has(b.Reason) { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed today that this condition may be broader than necessary, but we couldn't think of a realistic scenario in which it would cause problems. However, we may need to revisit this condition if we ever observe any strange behavior with respect to status updates for routes with external certificates during, say, a rolling update of a router deployment.
// - If the external certificate has changed, it unregisters the old one and registers the new one. | ||
// - If the external certificate has not changed, it revalidates and updates the in-memory TLS certificate and key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is a little unclear. Presumably by "the external certificate has changed" or "the external certificate has not changed", you mean that the external certificate secret reference has or has not changed. That is, if the secret reference changes, you need to track the new referent, but if the reference is the same, you only need to check whether the referent secret's contents have changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean that the external certificate secret reference has or has not changed
Yes exactly, if the secret reference is changed to a new secret, we will track the new secret. And if the secret name is unchanged, we will check the contents of the secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming my understanding! If the comment is unclear, we should try to make it clearer. If not in this PR, maybe in the next one that touches this code.
// - Handles secret recreation: If a secret is recreated (created after being | ||
// deleted), it fetches the latest route object associated with it using the | ||
// RouteLister and updates the route's status to trigger a full | ||
// re-evaluation of the route by the entire plugin chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a gap here. Consider the following scenario:
- Start with a router deployment.
- Create a route with an external certificate reference to secret.
- Delete the secret.
- Restart the router deployment's pods (for example, for a rolling update).
- Recreate the secret.
The new router pods will reject the route at step 4 and will miss the secret recreate event at step 5, right?
It might not be feasible to close this gap, but I want to raise this concern to make sure I understand correctly and to note that it is a potential issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is likely possible because the route pod is managing the lifecycle of the secret manager and the secret informers. Restarting the router deployment's pods will kill the previously stored watch, and new pods will not register the new watch because the secret won't be there. So, yeah, in this scenario, we will miss the secret recreate event.
It might not be feasible to close this gap, but I want to raise this concern to make sure I understand correctly and to note that it is a potential issue.
I agree on this, given the secret manager's dependency on the router pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see something related to this in the EP as well.
The router will bootstrap the secret monitor to ensure it can keep the certificate up-to-date. This means the router pod will maintain active watches for every secret that is referenced by a route.
Signed-off-by: chiragkyal <[email protected]>
@chiragkyal: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Just an observation: It occurs to me that using |
@@ -36,6 +37,7 @@ const ( | |||
// RouteStatusRecorder is an object capable of recording why a route status condition changed. | |||
type RouteStatusRecorder interface { | |||
RecordRouteRejection(route *routev1.Route, reason, message string) | |||
RecordRouteUpdate(route *routev1.Route, reason, message string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a future clean-up, HandleRoute
could use this new method.
Excellent work! Thank you for your patience and effort with this PR! |
Finally!! Thank you very much for the approval. |
7a688b0
into
openshift:master
@chiragkyal: Jira Issue OCPBUGS-33958: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-33958 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] Distgit: ose-haproxy-router-base |
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-haproxy-router |
This PR:
Handles secret re-creation scenarios for externalCertificate where a previously deleted secret is added again.
Introduces a
deletedSecrets
map inRouteSecretManager
to track routes for which the associated secret was deleted. This helps differentiate between new secret creation and recreation of a previously deleted secret.Refactors secret handling logic in
generateSecretHandler
to address these changes.AddFunc
handles the secret recreations. Upon detecting a recreated secret, it revalidates the route, updates its TLS configuration, and triggers aModified
event for the next plugins.DeleteFunc
does not callUnregisterRoute()
, instead tracks deleted secrets.Improves the logic for handling
Modified
events for routes with externalCertificate to avoid blindly unregistering and re-registrations with the secret manager.UnregisterRoute()
, which leads to the stopping of the informer.Makes stopping of the informer self-contained to the
RouteSecretManager
plugin.RemoveRoute
(templateRouter
) should not callUnregisterRoute()
to avoid removal of informer whenRouteSecretManager
calls the next plugin chain withwatch.Deleted
event.externalCertificate
field is removedexternalCertificate
field is changed to a new secret name. [Old secret name is cached insideSecretManager
'sregisteredHandlers
map.Adds unit tests to cover the new logic and scenarios.
New Helper Functions:
validate()
to encapsulate the externalCertificate validation logic.populateRouteTLSFromSecret()
to update a route's in-memory TLS configuration from the referenced secret.unregister()
to remove the route's registration with the secret manager and clean up references indeletedSecrets
.Part of : https://issues.redhat.com//browse/OCPBUGS-33958